SnappyData: Streaming, Transactions, and Interactive Analytics in a Unified Engine
Authors
Abstract
In recent years, our customers have expressed frustration with the traditional approach of using a combination of disparate products to handle their streaming, transactional and analytical needs. The common practice of stitching heterogeneous environments together in custom ways has caused enormous production woes by increasing development complexity and total cost of ownership. With SnappyData, an open source platform, we propose a unified engine for real-time operational analytics that delivers stream analytics, OLTP and OLAP in a single integrated solution. We realize this platform through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics). After presenting a few use case scenarios, we carefully study the challenges involved in marrying these two systems with drastically different design philosophies: Spark is a computational model designed for high-throughput analytics, whereas GemFire is a transactional engine designed for low-latency operations. Moreover, we find that even in-memory solutions are often incapable of delivering truly interactive analytics (i.e., responses within a couple of seconds) when faced with large data volumes or high-velocity streams. SnappyData therefore combines state-of-the-art approximate query processing techniques and a variety of data synopses to ensure interactive analytics over both streaming and stored data. Through a novel concept of high-level accuracy contracts (HAC), SnappyData is the first to offer end users an intuitive means of expressing their accuracy requirements without overwhelming them with statistical concepts.
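The abstract describes answering queries approximately while still honoring a user-stated accuracy requirement. As a rough illustration only, and not SnappyData's actual implementation or API, the Python sketch below shows the general idea behind sampling-based approximate query processing: estimate an average from a uniform random sample, attach a normal-approximation (CLT) confidence interval, and fall back to an exact scan when the estimated relative error exceeds a user-supplied bound. The names `approx_mean`, `answer_with_contract`, and `max_relative_error` are hypothetical, and the interval deliberately ignores the finite-population correction to keep the example short.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Estimate:
    value: float           # point estimate of the mean
    lower: float           # lower end of the confidence interval
    upper: float           # upper end of the confidence interval
    relative_error: float  # half-width of the interval divided by |estimate|


def approx_mean(sample, confidence_z=1.96):
    """Estimate a population mean from a uniform random sample.

    Uses the CLT normal approximation: standard error = s / sqrt(n).
    confidence_z=1.96 corresponds to roughly 95% confidence.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled values")
    mean = sum(sample) / n
    # Unbiased sample variance (n - 1 in the denominator).
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    half_width = confidence_z * math.sqrt(var / n)
    rel = half_width / abs(mean) if mean != 0 else math.inf
    return Estimate(mean, mean - half_width, mean + half_width, rel)


def answer_with_contract(full_data, sample_fraction, max_relative_error, seed=0):
    """Hypothetical accuracy check, not SnappyData's HAC mechanism.

    Answers AVG from a sample if the estimated relative error meets the
    user's bound; otherwise computes the exact answer over all the data.
    """
    rng = random.Random(seed)
    k = max(2, int(len(full_data) * sample_fraction))
    sample = rng.sample(full_data, k)
    est = approx_mean(sample)
    if est.relative_error <= max_relative_error:
        return est.value, "approximate"
    return sum(full_data) / len(full_data), "exact"
```

For example, with `data = [float(i % 100) for i in range(100_000)]`, a 1% sample comfortably meets a 5% relative-error bound and is answered approximately, while a 0.1% bound forces the exact scan. The user only states the tolerable error; the sample-size and variance reasoning stays inside the engine, which is the kind of separation the abstract attributes to accuracy contracts.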
Similar resources
SnappyData: A Unified Cluster for Streaming, Transactions and Interactive Analytics
Many modern applications are a mixture of streaming, transactional and analytical workloads. However, traditional data platforms are each designed for supporting a specific type of workload. The lack of a single platform to support all these workloads has forced users to combine disparate products in custom ways. The common practice of stitching heterogeneous environments has caused enormous pr...
Experience in Extending Query Engine for Continuous Analytics
Qiming Chen, Meichun Hsu. HP Laboratories, HPL-2010-44. In-Database Stream Processing. Combining data warehousing and stream processing technologies has great potential in offering low-latency data-intensive analytics. Unfortunately, such convergence has not been properly addressed so far. The current generation of stream processing sy...
Real Time Analytics: Algorithms and Systems
Velocity is one of the 4 Vs commonly used to characterize Big Data [27]. In this regard, Forrester remarked the following in Q3 2014 [94]: “The high velocity, white-water flow of data from innumerable real-time data sources such as market data, Internet of Things, mobile, sensors, clickstream, and even transactions remain largely unnavigated by most firms. The opportunity to leverage streaming ...
A Reference Architecture and Road map for Enabling E-commerce on Apache Spark
Apache Spark is an execution engine that besides working as an isolated distributed, in-memory computing engine also offers close integration with Hadoop’s distributed file system (HDFS). Apache Spark's underlying appeal is in providing a unified framework to create sophisticated applications involving workloads. It unifies multiple workloads, handles unstructured data very well and has easy-to...
A standard Interactive Multimedia eBook Generator Engine for e-Learning Process
Introduction: Using standard authoring tools is essential to promote E-Learning in teaching-learning process. Learning content in medical sciences often consists of multimedia elements. On the other hand, it is frequently required to revise and update the medical content. Hence, access to the authoring tools that can encompass multimedia elements and allow easy content revision is helpful in e-...